
    Snapshot isolation for transactional stream processing

    Transactional database systems and data stream management systems have been thoroughly investigated over the past decades. While the two follow completely different data processing models, the combined concept of transactional stream processing promises to be the future data processing model. So far, however, it has not been investigated how well-known multi-user concepts from DBMSs and DSMSs can be transferred to this model or how they need to be redesigned. In this paper, we propose a transaction model combining streaming and stored data as well as continuous and ad-hoc queries. Based on this, we present protocols for concurrency control of such queries that guarantee snapshot isolation, as well as protocols for the consistency of transactions comprising several shared states. In our evaluation, we show that our protocols represent a resilient and scalable solution meeting all requirements of such a model.
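
    As one concrete illustration of the snapshot isolation guarantee discussed above, the following is a minimal, hypothetical sketch of a multi-version store with first-committer-wins validation; the class and method names are ours, not the paper's protocol.

```python
# Hypothetical multi-version key-value store with snapshot isolation,
# validated at commit time via first-committer-wins. Single-threaded
# sketch; a real protocol must also handle true concurrency.

class SIStore:
    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # logical commit timestamp

    def begin(self):
        # A transaction reads from the snapshot as of its start timestamp.
        return {"start_ts": self.clock, "writes": {}}

    def read(self, txn, key):
        if key in txn["writes"]:              # read-your-own-writes
            return txn["writes"][key]
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= txn["start_ts"]:         # newest version in the snapshot
                return value
        return None

    def write(self, txn, key, value):
        txn["writes"][key] = value            # buffered until commit

    def commit(self, txn):
        # First-committer-wins: abort on a write-write conflict with any
        # transaction that committed after this one's snapshot was taken.
        for key in txn["writes"]:
            versions = self.versions.get(key, [])
            if versions and versions[-1][0] > txn["start_ts"]:
                raise RuntimeError(f"write-write conflict on {key!r}")
        self.clock += 1
        for key, value in txn["writes"].items():
            self.versions.setdefault(key, []).append((self.clock, value))

store = SIStore()
t1, t2 = store.begin(), store.begin()         # concurrent snapshots
store.write(t1, "state", "a")
store.commit(t1)
store.write(t2, "state", "b")
# store.commit(t2) would now raise: t1 committed after t2's snapshot
```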

    Quality of Service and Predictability in DBMS

    DBMSs are a ubiquitous building block of the software stack in many complex applications. Middleware technologies, application servers and mapping approaches hide the core database technologies just like power, networking infrastructure and operating system services. Furthermore, many enterprise-critical applications demand a certain degree of quality of service (QoS) or guarantees, e.g. with respect to response time, transaction throughput and latency, but also completeness or, more generally, quality of results. Examples of such applications are billing systems in telecommunications, where each telephone call has to be monitored and registered in a database; e-commerce applications, where orders have to be accepted even in times of heavy load and the waiting time of customers should not exceed a few seconds; ERP systems processing a large number of transactions in parallel; or systems processing streaming or sensor data in real time, e.g. in process automation or traffic control. As part of a complex multi-level software stack, database systems have to share or contribute to these QoS requirements, which means that guarantees have to be given by the DBMS, too, and that the processing of database requests must be predictable. Today's mainstream DBMSs typically follow a best-effort approach: requests are processed as fast as possible without any guarantees, and the optimization goal of query optimizers and tuning approaches is to minimize resource consumption rather than to fulfill given service-level agreements. Motivated by this situation, there is an emerging need for database services that provide guarantees or simply behave in a predictable manner, and that at the same time interact with other components of the software stack in order to fulfill the overall requirements. This need is also driven by the paradigm of service-oriented architectures widely discussed in industry. Currently, it is addressed only by very specialized solutions. Nevertheless, database researchers have developed several techniques contributing to the goal of QoS-aware database systems. The purpose of this tutorial is to introduce database researchers and practitioners to the scope, the challenges and the available techniques for the problem of predictability and QoS agreements in DBMSs.
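
    To make the contrast between best-effort and predictable processing concrete, here is a small illustrative sketch of SLA-aware admission control; the controller, its cost estimates and the SLA bound are our assumptions for illustration, not a technique from the tutorial.

```python
import collections

SLA_RESPONSE_TIME_MS = 2000  # hypothetical service-level agreement

class AdmissionController:
    """Admit a query only if its estimated cost still fits the SLA."""

    def __init__(self, sla_ms):
        self.sla_ms = sla_ms
        self.queue = collections.deque()  # pending (query, est_cost_ms)

    def pending_work_ms(self):
        return sum(cost for _, cost in self.queue)

    def admit(self, query, est_cost_ms):
        # Predictability over throughput: reject early instead of letting
        # the queue grow until every request misses its deadline.
        if self.pending_work_ms() + est_cost_ms > self.sla_ms:
            return False
        self.queue.append((query, est_cost_ms))
        return True

ctrl = AdmissionController(SLA_RESPONSE_TIME_MS)
print(ctrl.admit("heavy join", 1500))     # True: fits within the SLA
print(ctrl.admit("analytics scan", 800))  # False: would exceed the bound
```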

    What is in the KGQA benchmark datasets? Survey on challenges in datasets for question answering on knowledge graphs

    Question Answering based on Knowledge Graphs (KGQA) still faces difficult challenges when transforming natural language (NL) questions into SPARQL queries. Simple questions referring to a single triple are answerable by most QA systems, but more complex questions requiring queries with subqueries or several functions remain a tough challenge in this field of research. Evaluation results of QA systems may therefore also depend on the benchmark dataset on which a system has been tested. To give an overview and reveal specific characteristics, we examined the currently available KGQA datasets with respect to several challenging aspects. This paper presents a detailed look into these datasets and compares them in terms of the challenges a KGQA system faces.
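
    The gap between simple and complex questions can be made concrete with two hypothetical examples in the style of DBpedia (PREFIX declarations omitted; the vocabulary is illustrative, not taken from the surveyed datasets):

```python
# "Simple" question: one triple pattern answers it.
# NL: "Who is the mayor of Berlin?"
simple_query = """
SELECT ?mayor WHERE {
  dbr:Berlin dbo:mayor ?mayor .
}
"""

# "Complex" question: grouping, aggregation and ordering are required,
# the query shapes many KGQA systems still struggle with.
# NL: "Which city hosts the most universities?"
complex_query = """
SELECT ?city (COUNT(?university) AS ?n) WHERE {
  ?university a dbo:University ;
              dbo:city ?city .
}
GROUP BY ?city
ORDER BY DESC(?n)
LIMIT 1
"""
```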

    Database as a service (DBaaS)

    Modern Web or "Eternal-Beta" applications necessitate a flexible and easy-to-use data management platform that allows the evolutionary development of databases and applications. The classical approach of relational database systems strictly following the ACID properties has to be extended by an extensible and easy-to-use persistency layer with specialized DB features. Using the underlying concept of Software as a Service (SaaS) also enables an economic advantage based on the "economy of scale", where application and system environments only need to be provided once but can be used by thousands of users. Within this tutorial, we look at the current state of the art from different perspectives. We outline foundations and techniques to build database services based on the SaaS paradigm. We discuss requirements from a programming perspective, show different dimensions in the context of consistency and reliability, and also describe different non-functional properties under the umbrella of service-level agreements (SLAs).

    Big spatial data processing frameworks: feature and performance evaluation: experiments & analyses

    Nowadays, a vast amount of data is generated and collected every moment, and often this data has a spatial and/or temporal aspect. To analyze these massive data sets, big data platforms like Apache Hadoop MapReduce and Apache Spark emerged, and extensions that take spatial characteristics into account were created for them. In this paper, we analyze and compare existing solutions for spatial data processing on Hadoop and Spark. In our comparison, we investigate their features as well as their performance in a micro benchmark for spatial filter and join queries. Based on the results and our experiences with these frameworks, we outline the requirements for a general spatio-temporal benchmark for Big Spatial Data processing platforms and sketch first solutions to the identified problems.
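
    For illustration, the following self-contained sketch shows the two query types used in such a micro benchmark, a spatial range filter and a distance join; plain Python stands in for the evaluated Hadoop/Spark frameworks, and the data is synthetic.

```python
import math
import random

random.seed(0)
points = [(random.uniform(0, 100), random.uniform(0, 100))
          for _ in range(10_000)]

def range_filter(pts, xmin, ymin, xmax, ymax):
    # Spatial filter: keep points inside an axis-aligned query window.
    return [(x, y) for x, y in pts
            if xmin <= x <= xmax and ymin <= y <= ymax]

def distance_join(left, right, eps):
    # Naive O(n*m) distance join; the evaluated frameworks replace this
    # loop with grid or R-tree partitioning to scale out.
    return [(p, q) for p in left for q in right
            if p is not q and math.dist(p, q) <= eps]

window = range_filter(points, 10, 10, 20, 20)
pairs = distance_join(window, window, eps=1.0)
print(len(window), len(pairs))
```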

    Putting Pandas in a Box

    Pandas, the Python Data Analysis Library, is a powerful and widely used framework for data analytics. In this work, we present our approach to push down the computational part of Pandas scripts into the DBMS by using a transpiler. In addition to basic data processing operations, our approach also supports access to external data stored in files instead of the DBMS. Moreover, user-defined Python functions are transformed automatically into SQL UDFs executed in the DBMS. The latter allows the integration of complex computational tasks including machine learning. We show how this feature can be used to implement a so-called model join, i.e. applying pre-trained ML models to data in SQL tables.
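
    The push-down idea can be sketched as follows: instead of evaluating a Pandas selection and projection in Python, an equivalent SQL string is emitted for the DBMS. This toy transpiler handles only one access pattern and is our simplification, not the paper's implementation.

```python
# Toy transpiler: a SQLFrame records selections/projections symbolically
# and emits SQL instead of computing in Python. Handles only
# df[df[col] > k][[cols]]; everything here is a simplified illustration.

class SQLFrame:
    def __init__(self, table, where=None, cols="*"):
        self.table, self.where, self.cols = table, where, cols

    def __getitem__(self, key):
        if isinstance(key, str):        # df["a"] -> column reference
            return Column(key)
        if isinstance(key, list):       # df[["a","b"]] -> projection
            return SQLFrame(self.table, self.where, ", ".join(key))
        if isinstance(key, Predicate):  # df[pred] -> selection
            return SQLFrame(self.table, key.sql, self.cols)
        raise TypeError(f"unsupported subscript: {key!r}")

    def to_sql(self):
        sql = f"SELECT {self.cols} FROM {self.table}"
        return sql + (f" WHERE {self.where}" if self.where else "")

class Column:
    def __init__(self, name):
        self.name = name
    def __gt__(self, other):            # df["a"] > 100 -> SQL predicate
        return Predicate(f"{self.name} > {other!r}")

class Predicate:
    def __init__(self, sql):
        self.sql = sql

df = SQLFrame("orders")
print(df[df["price"] > 100][["id", "price"]].to_sql())
# -> SELECT id, price FROM orders WHERE price > 100
```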

    PatchIndex: exploiting approximate constraints in distributed databases

    Cloud data warehouse systems lower the barrier to accessing data analytics. These applications often lack a database administrator and integrate data from various sources, potentially leading to data that does not satisfy strict constraints. Automatic schema optimization in self-managing databases is difficult in these environments without prior data cleaning steps. In this paper, we focus on constraint discovery as a subtask of schema optimization. Perfect constraints might not exist in such unclean datasets because a small set of values violates them. Therefore, we introduce the concept of a generic PatchIndex structure, which handles exceptions to given constraints and enables database systems to define such approximate constraints. We apply the concept to distributed databases, providing parallel index creation approaches and optimization techniques for parallel queries using PatchIndexes. Furthermore, we describe heuristics for the automatic discovery of PatchIndex candidate columns and demonstrate the performance benefit of PatchIndexes in our evaluation.
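
    A hedged sketch of the idea as we read it: a constraint (here, sortedness) is treated as approximately valid by recording the violating rows as exceptions ("patches"), so a scan can exploit the constraint on the clean part and union in the patches. All names below are ours, not the paper's.

```python
import bisect

def discover_patches(column):
    # Greedy discovery: rows breaking ascending order become patches.
    # (The real system would aim for a minimal exception set.)
    patches, last = set(), float("-inf")
    for row_id, value in enumerate(column):
        if value < last:
            patches.add(row_id)   # exception to the sortedness constraint
        else:
            last = value
    return patches

def range_scan(column, patches, lo, hi):
    # Exploit the approximate constraint: binary-search the clean,
    # sorted part, then union in the matching patch rows.
    clean = [(v, i) for i, v in enumerate(column) if i not in patches]
    values = [v for v, _ in clean]
    left = bisect.bisect_left(values, lo)
    right = bisect.bisect_right(values, hi)
    hits = {i for _, i in clean[left:right]}
    hits |= {i for i in patches if lo <= column[i] <= hi}
    return sorted(hits)

col = [1, 2, 3, 0, 4, 5, 2, 6]         # almost sorted: rows 3 and 6 violate
patches = discover_patches(col)        # -> {3, 6}
print(range_scan(col, patches, 2, 5))  # -> [1, 2, 4, 5, 6]
```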

    10381 Summary and Abstracts Collection -- Robust Query Processing

    Dagstuhl Seminar 10381 on robust query processing (held 19.09.10 - 24.09.10) brought together a diverse set of researchers and practitioners with a broad range of expertise for the purpose of fostering discussion and collaboration regarding causes, opportunities, and solutions for achieving robust query processing. The seminar strove to build a unified view across the loosely coupled system components responsible for the various stages of database query processing. Participants were chosen for their experience with database query processing and, where possible, their prior work in academic research or product development towards robustness in database query processing. In order to pave the way to motivate, measure, and protect future advances in robust query processing, Seminar 10381 focused on developing tests for measuring the robustness of query processing. In these proceedings, we first review the seminar topics, goals, and results, then present abstracts or notes from some of the seminar break-out sessions. We also include, as an appendix, the robust query processing reading list that was collected and distributed to participants before the seminar began, as well as summaries of a few of those papers contributed by some participants.

    From natural language questions to SPARQL queries: a pattern-based approach

    Linked Data knowledge bases are valuable sources of knowledge that give insights, reveal facts about various relationships and provide a large amount of metadata in well-structured form. Although the format of semantic information, namely RDF(S), is kept simple by representing each fact as a triple of subject, property and object, access to the knowledge is only possible via SPARQL queries on the data. Question Answering (QA) systems therefore provide a user-friendly way to access any type of knowledge base, and especially Linked Data sources, to gain insight into the semantic information. As RDF(S) knowledge bases are usually structured in the same way and inherently provide semantic metadata about the contained information, we propose a novel approach that is independent of the underlying knowledge base. Thus, the main contribution of our approach is the simple replaceability of the underlying knowledge base. The algorithm is based on general question and query patterns and only accesses the knowledge base for the actual query generation and execution. This paper presents the proposed approach and an evaluation against state-of-the-art Linked Data approaches with respect to the challenges faced by QA systems.
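
    A minimal sketch of the pattern-based idea: match the question against a small catalogue of question patterns and fill the paired SPARQL template. The patterns, prefixes and property names below are illustrative assumptions, not the paper's actual catalogue.

```python
import re

PATTERNS = [
    # question pattern                        -> SPARQL template
    (r"^who is the (?P<prop>\w+) of (?P<ent>[\w ]+)\?$",
     "SELECT ?x WHERE {{ dbr:{ent} dbo:{prop} ?x . }}"),
    (r"^how many (?P<type>\w+)s are in (?P<ent>[\w ]+)\?$",
     "SELECT (COUNT(?x) AS ?n) WHERE "
     "{{ ?x a dbo:{type} ; dbo:location dbr:{ent} . }}"),
]

def to_sparql(question):
    q = question.strip().lower()
    for pattern, template in PATTERNS:
        m = re.match(pattern, q)
        if m:
            # Entity and class slots are capitalised as in the KB;
            # property slots stay lowercase (illustrative convention).
            slots = {k: (v if k == "prop" else v.title().replace(" ", "_"))
                     for k, v in m.groupdict().items()}
            return template.format(**slots)
    return None  # no pattern matched; a real system would fall back

print(to_sparql("Who is the mayor of Berlin?"))
# -> SELECT ?x WHERE { dbr:Berlin dbo:mayor ?x . }
```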